{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Lab 19 - Functions and conditional statements\n", "\n", "We will return to the MoMA artwork dataset from Lab 5 for this lab. The data can be downloaded from GitHub here: [Artworks.csv](https://media.githubusercontent.com/media/MuseumofModernArt/collection/master/Artworks.csv) \n", "\n", "More information about the data is available at [https://github.com/MuseumofModernArt/collection](https://github.com/MuseumofModernArt/collection)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "pd.set_option('display.max_columns', None)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading and cleaning the data\n", "\n", "Load the data into a dataframe called `art`. The date column, which is the date the art was created, uses several different terms to indicate the date is unknown. To replace these with NaN (Pandas' representation for missing data), use the parameter `na_values = [\"Unknown\", \"n.d.\", \"nd\", \"n.d\", \"TBC\",\"TBD\"]` in your `read_csv()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will next drop the rows with missing values in the Gender, Date, and Nationality columns. We don't want to drop rows with missing values in any column because many of the dimensions (last 9 columns) are missing, since they are not applicable." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "art = art.dropna(subset=['Gender','Date', 'Nationality'])\n", "art.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Number of artists for each artwork\n", "\n", "Suppose we want to know how many pieces of art are made by 1 artist, 2 artists, etc. 
There is no column with this information, but we can get it from the `Gender` column.\n", "\n", "First, use `value_counts()` to show the different possible values in the `Gender` column." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How can you tell from the `Gender` column how many artists (or other entities, like companies) created an artwork?\n", "\n", "We are going to count how many `(` characters appear in the `Gender` column in each row and store that number in a new column, representing the number of artists.\n", "\n", "To do the counting, we define a function that outputs (returns) the number of times a `(` appears in the input `x`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def count_artists(x):\n", "    return x.count(\"(\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can call (run) our function on some sample `Gender` column values to test it out:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "count_artists(\"(Female)\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "count_artists(\"(Male) (Male)\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "count_artists(\"() (Male)\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Does our function look like it works? We can apply it to all rows in the `Gender` column and store the output in a new column called `NumArtists`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "art[\"NumArtists\"] = art[\"Gender\"].apply(count_artists)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display the dataframe `art`. Is your new column there? Check a few of the rows to see if it is correct." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a bar plot of the number of artists for each work." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What if we wanted to know which pieces of art were made by an American, and exactly how many American artists were involved? \n", "\n", "Try to make a new function called `count_Americans` that counts how many times `(American)` appears in the input `x`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, apply your function to the `Nationality` column to create a new column called `numAmericans`." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display `art` to check that the new column was created correctly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Create a bar plot of the number of Americans for the artworks." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How does this plot differ from the plot of the number of artists?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Grouping artwork by date\n", "\n", "Suppose we want to assign each piece of art to Pre-Modern (created before 1860), Modern (created 1860 - 1945), or Contemporary (created after 1945).\n", "\n", "We can make a function that takes in the date and outputs the period. Some of the dates are given as ranges or as c. 
1900, so we will just assign those to the period \"Range\" (for range of dates)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def period(x):\n", "    if len(x) > 4:\n", "        return \"Range\"\n", "    elif int(x) < 1860:\n", "        return \"Pre-Modern\"\n", "    elif int(x) <= 1945:\n", "        return \"Modern\"\n", "    else:\n", "        return \"Contemporary\"" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Can you apply this function to the column `Date` to create a new column `Period`?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Display `art` to check that the column `Period` was created correctly." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now make a bar chart of the distribution of art periods." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Challenge: Area of round objects\n", "Some of the artworks have a diameter. Create a new dataframe of only objects with a value in the diameter column by dropping the rows with missing values. Assuming these objects are all 2D (a big assumption!), let's create a new column with the area of this art. Remember that:\n", "\n", "area = $\\pi (\\text{diameter}/2)^2$\n", "\n", "You can get the value of $\\pi$ with `np.pi`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }